---
title: "Shrid to LGD matching summary"
author: "Pratik Mahajan"
date: "2025-04-08"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Overview

This report documents the process of linking villages in the SHRUG dataset to their corresponding Gram Panchayat (GP) or Urban Local Body (ULB) using the official Local Government Directory (LGD) datasets. It flags discrepancies, notes special administrative situations, and summarizes matching outcomes.

------------------------------------------------------------------------

## Village-to-GP Mapping

On January 8, 2025, I downloaded state-wise datasets containing the village-to-GP mapping from the Ministry of Panchayati Raj website (<https://lgdirectory.gov.in/downloadDirectory.do?OWASP_CSRFTOKEN=WJWT-9M8E-LTVJ-6FSR-DUR3-GDPD-0IFE-VQMD>) and merged them into a single file.
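
A minimal sketch of how such state-wise files could be stacked into one data frame is given here. It is illustrative only and not the script used for this report: the helper name `combine_state_files`, the folder name in the example call, and the assumption that every file is a CSV with identical columns are all hypothetical.

```r
# Hypothetical helper: stack all state-wise CSV files found in one folder.
# Assumes every file has the same column names (an assumption for this
# sketch, not something the LGD download is known to guarantee).
combine_state_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in ", dir)
  pieces <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, pieces)
  rownames(combined) <- NULL
  combined
}

# Example call (the folder name is illustrative):
# village_gp <- combine_state_files("lgd_statewise")
```

If the state files differ in their columns, `rbind` stops with an error, which is a useful early warning before any matching is attempted.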

On the same day, I also downloaded the ULB-to-village dataset from the same website; it gives the Urban Local Body associated with each entry.

Finally, I downloaded the "Shrug Location Names and Additional Keys" dataset containing the labels for each shrid entry.

Shrid codes in SHRUG were then merged with the combined LGD village-to-GP mapping file. Some village codes in SHRUG were **not found** in the LGD dataset or had **no associated Gram Panchayat**.

### Discrepancy

In a subset of cases, villages were present in SHRUG but either:

-   Were **missing** from the LGD dataset, or
-   Were **present** in LGD but had empty fields for the LGD code and GP name (`NA` or blank).

These were flagged, respectively, with the labels:

-   `"Present in Shrug but Missing from LGD dataset"`
-   `"Place Present in LGD Dataset But Empty Local Body Code Field"`

**The percentage of villages in each category is summarized by state at the end.**
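
The two discrepancy cases above can be told apart with a left join that keeps every SHRUG row. The following is a minimal sketch on toy data, not the code used for this report: the column names `village_code`, `local_body_code`, `in_lgd`, and `match_status` are assumptions made for illustration.

```r
# Toy inputs; all column names here are hypothetical.
shrug <- data.frame(village_code = c(101, 102, 103))
lgd <- data.frame(village_code = c(101, 102),
                  local_body_code = c("5001", ""),
                  stringsAsFactors = FALSE)

# Mark LGD rows so they can be recognised after the merge.
lgd$in_lgd <- TRUE

# Left join: keep every SHRUG row, whether or not LGD has a match.
merged <- merge(shrug, lgd, by = "village_code", all.x = TRUE)

# Case 1: the village code does not appear in LGD at all.
missing_from_lgd <- is.na(merged$in_lgd)

# Case 2: the village is in LGD but its local body code is NA or blank.
empty_code <- !missing_from_lgd &
  (is.na(merged$local_body_code) | trimws(merged$local_body_code) == "")

merged$match_status <- "Matched"
merged$match_status[missing_from_lgd] <-
  "Present in Shrug but Missing from LGD dataset"
merged$match_status[empty_code] <-
  "Place Present in LGD Dataset But Empty Local Body Code Field"
```

The helper column `in_lgd` is what separates a village that is absent from LGD from one that is present with an empty code, since both end up with a missing code after the join.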

------------------------------------------------------------------------

## Multi-village Gram Panchayats

Some Gram Panchayats, called group gram panchayats, are associated with **multiple villages**. SHRUG rows tied to such GPs were marked using the variable `shrid_part_of_multi_village_GP`.

This helps account for shared panchayats during village-level analysis.
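
One way such a flag could be derived is to count how many rows share each GP code. This is a minimal sketch on toy data under stated assumptions: the column name `gp_code` is hypothetical, and each row is assumed to represent one village.

```r
# Toy matched data; `gp_code` is a hypothetical column name.
matched <- data.frame(shrid = c("a", "b", "c", "d"),
                      gp_code = c("G1", "G1", "G2", NA),
                      stringsAsFactors = FALSE)

# Count how many rows share each GP code (NA codes are left out by table()).
villages_per_gp <- table(matched$gp_code)

# Look up each row's count; rows without a GP code get NA.
n_in_gp <- as.integer(villages_per_gp)[match(matched$gp_code,
                                             names(villages_per_gp))]

# Flag rows whose GP covers more than one village; unmatched rows stay FALSE.
matched$shrid_part_of_multi_village_GP <- !is.na(n_in_gp) & n_in_gp > 1
```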

------------------------------------------------------------------------

## Non-Panchayati Raj States

Certain shrid entries belong to states and union territories that **do not operate Panchayati Raj systems**, namely **Nagaland, Meghalaya, Mizoram, and the urban centres of Chandigarh and the NCT of Delhi**. These were flagged as:

> **"State Has No Panchayati Raj"**

Such rows were excluded from LGD match rate calculations.

------------------------------------------------------------------------

## Urban Local Body Integration

Some SHRUG villages had **both** GP and ULB codes, a sign that these villages may have been absorbed into urban jurisdictions after initial GP assignment.

### Handling Overlap

-   Where **both codes** were present, the ULB code was retained as the final LGD code.
-   The original GP code and name were stored in:
    -   `old_gp_lgd_code`
    -   `old_gp_name`
-   A variable `gp_to_urban_conversion` was created to flag these conversions. There were 929 such entries.
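
The overlap rules in this list can be sketched as follows. This is an illustration on toy data rather than the code used for this report: the input columns `gp_code`, `gp_name`, and `ulb_code` and the output column `lgd_code` are hypothetical names, while `old_gp_lgd_code`, `old_gp_name`, and `gp_to_urban_conversion` follow the variables named above.

```r
# Toy data; the input column names are hypothetical.
villages <- data.frame(
  shrid = c("a", "b", "c"),
  gp_code = c("G1", "G2", NA),
  gp_name = c("Alpha GP", "Beta GP", NA),
  ulb_code = c(NA, "U7", "U8"),
  stringsAsFactors = FALSE
)

# Rows that carry both a GP code and a ULB code.
both <- !is.na(villages$gp_code) & !is.na(villages$ulb_code)

# Flag the conversion and keep the superseded GP details.
villages$gp_to_urban_conversion <- both
villages$old_gp_lgd_code <- ifelse(both, villages$gp_code, NA_character_)
villages$old_gp_name <- ifelse(both, villages$gp_name, NA_character_)

# Final code: the ULB code wherever one exists, otherwise the GP code.
villages$lgd_code <- ifelse(!is.na(villages$ulb_code),
                            villages$ulb_code, villages$gp_code)

# Number of flagged conversions in the toy data.
n_conversions <- sum(villages$gp_to_urban_conversion)
```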

------------------------------------------------------------------------

## Summary Table: LGD Matching by State

I summarize the percentage of SHRUG villages in each state that:

-   Were **present in SHRUG but missing from the LGD dataset**
-   Were **present in the LGD dataset but had an empty local body code field**
-   Were **successfully matched** to either a GP or ULB

```{r echo=FALSE, message=FALSE, warning=FALSE}

# Load the matched data and drop rows from states without Panchayati Raj
shrug_LGD <- read.csv("shrug_LGD_matched.csv")
valid_shrug_LGD <- subset(shrug_LGD, local_body_name != "State Has No Panchayati Raj")

# Subsets for status
missing_lgd <- subset(valid_shrug_LGD, local_body_type == "Present in Shrug but Missing from LGD dataset")
empty_field <- subset(valid_shrug_LGD, local_body_type == "Place Present in LGD Dataset But Empty Local Body Code Field")
matched_lgd <- subset(valid_shrug_LGD, !(local_body_type == "Present in Shrug but Missing from LGD dataset" | local_body_type == "Place Present in LGD Dataset But Empty Local Body Code Field"))

# Build per-state percentages
states_all <- sort(unique(valid_shrug_LGD$state_name))

total_counts <- table(valid_shrug_LGD$state_name)[states_all]
missing_counts <- table(factor(missing_lgd$state_name, levels = states_all))
empty_fieldcounts <- table(factor(empty_field$state_name, levels = states_all))
matched_counts <- table(factor(matched_lgd$state_name, levels = states_all))

# Final percentage table
percent_table <- data.frame(
  State_UT_Name = states_all,
  Percent_Missing = round(100 * as.numeric(missing_counts) / as.numeric(total_counts), 2),
  Percent_Empty_Field = round(100 * as.numeric(empty_fieldcounts) / as.numeric(total_counts), 2),
  Percent_LGD_Matched = round(100 * as.numeric(matched_counts) / as.numeric(total_counts), 2)
)

# Overall matching stats
total_missing <- sum(missing_counts)
total_empty_field <- sum(empty_fieldcounts)
total_matched <- sum(matched_counts)
total_valid <- sum(total_counts)

overall_missing_rate <- round(100 * total_missing / total_valid, 2)
overall_empty_field_rate <- round(100 * total_empty_field / total_valid, 2)
overall_matched_rate <- round(100 * total_matched / total_valid, 2)
```

```{r, echo=FALSE, results='asis'}

# results='asis' lets the bold markdown emitted by cat() render as formatted text
cat("**Overall Present in SHRUG but Missing from LGD Dataset:**", overall_missing_rate, "%\n\n")
cat("**Overall Present in LGD Dataset but GP Code Missing:**", overall_empty_field_rate, "%\n\n")
cat("**Overall Successfully Matched to LGD (GP or ULB):**", overall_matched_rate, "%\n\n")

knitr::kable(percent_table, digits = 2, caption = "Percentage of SHRUG Villages Matched to LGD")

```
